Information-theoretic bounds for exact recovery in weighted stochastic block models using the Renyi divergence
نویسندگان
چکیده
We derive sharp thresholds for exact recovery of communities in a weighted stochastic block model, where observations are collected in the form of a weighted adjacency matrix, and the weight of each edge is generated independently from a distribution determined by the community membership of its endpoints. Our main result, characterizing the precise boundary between success and failure of maximum likelihood estimation when edge weights are drawn from discrete distributions, involves the Renyi divergence of order 1 2 between the distributions of within-community and between-community edges. When the Renyi divergence is above a certain threshold, meaning the edge distributions are sufficiently separated, maximum likelihood succeeds with probability tending to 1; when the Renyi divergence is below the threshold, maximum likelihood fails with probability bounded away from 0. In the language of graphical channels, the Renyi divergence pinpoints the information-theoretic capacity of discrete graphical channels with binary inputs. Our results generalize previously established thresholds derived specifically for unweighted block models, and support an important natural intuition relating the intrinsic hardness of community estimation to the problem of edge classification. Along the way, we establish a general relationship between the Renyi divergence and the probability of success of the maximum likelihood estimator for arbitrary edge weight distributions. Finally, we discuss consequences of our bounds for the related problems of censored block models and submatrix localization, which may be seen as special cases of the framework developed in our paper.
منابع مشابه
Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms
New phase transition phenomena have recently been discovered for the stochastic block model, for the special case of two non-overlapping symmetric communities. This gives raise in particular to new algorithmic challenges driven by the thresholds. This paper investigates whether a general phenomenon takes place for multiple communities, without imposing symmetry. In the general stochastic block ...
متن کاملA Sharp Sufficient Condition for Sparsity Pattern Recovery
Sufficient number of linear and noisy measurements for exact and approximate sparsity pattern/support set recovery in the high dimensional setting is derived. Although this problem as been addressed in the recent literature, there is still considerable gaps between those results and the exact limits of the perfect support set recovery. To reduce this gap, in this paper, the sufficient con...
متن کاملExploiting Tradeoffs for Exact Recovery in Heterogeneous Stochastic Block Models
The Stochastic Block Model (SBM) is a widely used random graph model for networks with communities. Despite the recent burst of interest in community detection under the SBM from statistical and computational points of view, there are still gaps in understanding the fundamental limits of recovery. In this paper, we consider the SBM in its full generality, where there is no restriction on the nu...
متن کاملCommunity detection and stochastic block models: recent developments
The stochastic block model (SBM) is a random graph model with planted clusters. It is widely employed as a canonical model to study clustering and community detection, and provides generally a fertile ground to study the statistical and computational tradeoffs that arise in network and data sciences. This note surveys the recent developments that establish the fundamental limits for community d...
متن کاملInformation-theoretic Limits for Community Detection in Network Models
We analyze the information-theoretic limits for the recovery of node labels in several network models, including the stochastic block model, as well as the latent space model. For the stochastic block model, the non-recoverability condition depends on the probabilities of having edges inside a community, and between different communities. For the latent space model, the non-recoverability condi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1509.06418 شماره
صفحات -
تاریخ انتشار 2015